[WIP] Iam_ocr Adding Wellington corpus by ChunChiehChang · Pull Request #2354 · kaldi-asr/kaldi

ChunChiehChang · 2018-04-12T21:24:46Z

I haven't updated the new results on the top of the scripts yet.

danpovey · 2018-04-12T23:44:41Z

egs/iam/v1/run.sh

-  steps/decode.sh --nj $nj --cmd $cmd exp/mono/graph data/test \
-    exp/mono/decode_test
-fi
+#if [ $stage -le 5 ]; then


are these commented-out parts supposed to be commented out?

ChunChiehChang · 2018-04-12

Yes, I intentionally commented out the decoding parts because the results from the first couple of stages (monophone/triphone) are not very good, and I didn't think I needed to compute them. I can just pass their alignments on to the later CNN stages of the script.

danpovey · 2018-04-13

Rather than commenting them out, you can set this at the top:
decode_gmm=false
and then do things like:
if [ $stage -le 5 ] && $decode_gmm; then
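The suggestion above can be sketched as a small stand-alone script. This is only an illustration of the pattern, not the code that was merged: the values of stage, nj and cmd are assumptions, and an echo stands in for the real steps/decode.sh call so the snippet runs outside a Kaldi checkout.

```shell
#!/usr/bin/env bash
# Sketch of gating a decoding stage on a boolean option.
# All values below are assumptions chosen for illustration.
stage=0
nj=4               # hypothetical number of parallel jobs
cmd=run.pl         # hypothetical job-dispatch command
decode_gmm=false   # set to true to decode the monophone/triphone systems

decoded=no
# $decode_gmm expands to the shell command `true` or `false`,
# so the second test succeeds only when the option is enabled.
if [ $stage -le 5 ] && $decode_gmm; then
  # Placeholder for the decode call shown in the diff:
  # steps/decode.sh --nj $nj --cmd $cmd exp/mono/graph data/test exp/mono/decode_test
  echo "decoding exp/mono"
  decoded=yes
fi
echo "decoded=$decoded"
```

With decode_gmm=false the block is skipped and the later stages can still use the GMM alignments. If the recipe sources utils/parse_options.sh after the defaults are set (an assumption about this run.sh), the option could also be switched on from the command line as --decode-gmm true.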

aarora8 · 2018-04-13T00:09:07Z

egs/iam/v1/local/remove_utterances_from_corpus.py

@@ -0,0 +1,138 @@
+#!/usr/bin/env python3


A cleaned version of this file, './local/remove_test_utterances_from_lob.py', is already part of kaldi/master under kaldi/egs/iam/v1.

danpovey · 2018-04-13T00:36:59Z

I am thinking of merging this within 24 hours-- I want to be a bit more aggressive about merging these recipes as I think it will help our work move faster.
@hhadian do you have time to review in a bit more detail? If you are busy it's OK, I can merge and we can address things as we find them.

danpovey · 2018-04-13T00:37:27Z

oh wait, I see this PR is very small. Merging.

Chun-Chieh Chang and others added 30 commits July 21, 2017 13:33

added basic file structure

22ea467

initial commit for preperation of IAM dataset

c011a3f

added scripts to extract features from image

296d112

Merge remote-tracking branch 'aarora8/IAM_eng' into iam_ocr

f6d6727

merging the files from the IAM_eng branch to use that file structure

removed egs/iam directory and moved necessary files to egs/iam_en to keep consistent with aarora8/IAM_eng branch

b5fbd16

adding code for writing to text, wav.scp and utt2spk file

f4ca585

Merge remote-tracking branch 'aarora8/IAM_eng' into iam_ocr

82c8596

Keeping up to date with code written by aarora8

added code to perform splits

cdbbefb

adding code for creating train,validation and test sets and also create spk2utt file

b787cfe

changed so train, test, valid examples are separated

910d710

merged aarora8/IAM_eng and fixed some minor things

8ef99d6

accidently split spk2utt incorrectly

696b21c

adding changes for setting desired variance floor value

dd208c3

cosmetic fix

3911054

reading lines.txt and creating a text dictionary

4938413

using ascii/lines files instead of xml

170f627

WIP added scripts to prepare dict and lexicon

aaf9dce

fixed bug where due to rounding errors the images won't be resized exactly

2e1b600

fixing bug in initializing variance floor vector

3bcc639

fixing bug in initializing variance floor vector creating vector in mle-diag-gmm.h instead of gmm-est.cc

2159a85

added bugfixes that aarora8/IAM_eng made

99173f8

adding code for preparing character based language model

1549c8b

adding files for character based language model

f455eff

added decoding to run.sh and the necessary files for decoding into local

2f7f7d9

adding decoding changes from ChunChiehChang and adding changes for triphone modelling

00f299c

minor changes so that testing scripts will be easier

f748891

added scripts to use characters instead of words for models. Also renamed run.sh to run{1-6}.sh

19e2e79

added in comments the ability to use individual words in training/testing instead of lines of text

424bb2e

changes to creating lexicon

32b7030

added scripts to ensure lm decodes one word without uniform distribution.

b591fd3

Chun-Chieh Chang and others added 22 commits January 12, 2018 17:08

removed commented out code

44d3ce2

adding updated results

d873784

removed unnecessary folder

e52df3c

removed some changes from variance floor option that is now removed

bbc7b4c

moved s5 to v1

018600c

merge master branch to get hhadian commits

ab5a51c

moved s5 to wrong location

7e1a8a2

removed unused files

f82aaa4

changed run to use local prepare_lang.sh

26bf5b4

Merge remote-tracking branch 'origin/master' into iam_ocr

bb6a073

added unk and added wellington corpus. removed LOB because IAM and LOB overlap

340de0c

forgot to add some stuff for unk and also for different topo for punctuations

048b2e5

Add initial scripts for e2e ocr - not cleaned

52c6721

Add e2e chain script

92a5866

Some fixes

ea839ad

Some cleaning

f5cbb24

removed the test words from LOB corpus. Previous commits just removed the LOB corpus entirely

aa6f698

Merge remote-tracking branch 'hhadian/e2e_ocr' into iam_ocr

d7aa22b

merged upstream master

c749a5b

adding wellington corpus and fixing some merge issues

639d76b

forgot for fix all merge conflicts

e48c0ef

removing some uneeded files

91ebb25

danpovey reviewed Apr 12, 2018

aarora8 reviewed Apr 13, 2018

added decode_gmm option. Removed unecessary file

750e11c

danpovey merged commit ccd50e2 into kaldi-asr:master Apr 13, 2018

LvHang pushed a commit to LvHang/kaldi that referenced this pull request Apr 14, 2018

[egs] Adding Wellington corpus for LM in IAM OCR (kaldi-asr#2354)

0d3e6bf

Skaiste pushed a commit to Skaiste/idlak that referenced this pull request Sep 26, 2018

[egs] Adding Wellington corpus for LM in IAM OCR (kaldi-asr#2354)

bb89717

danpovey merged 94 commits into kaldi-asr:master from ChunChiehChang:iam_ocr
